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Abstract. This paper considers the inversion of ill-posed linear operators. To 
regularise the problem the solution is enforced to lie in a non-convex subset. Theoretical 
properties for the stable inversion are derived and an iterative algorithm akin to the 
projected Landweber algorithm is studied. This work extends recent progress made on 
the efficient inversion of finite dimensional linear systems under a sparsity constraint 
to the Hilbcrt space setting and to more general non-convex constraints. 
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1. Introduction 

Let T be a linear map between two Hilbert spaces H and L. For / G H it is often 
required to recover / given Tf. Furthermore, this inversion should be stable to small 
perturbations, that is, given 

9 = Tf + e, (1) 

where g G L, e G L and / G H are elements in their respective Hilbert spaces, the 
estimate of / given g should linearly depend on the size of the error e. In many 
cases of interest, the operator T is ill-conditioned or non- invert ible and some form of 
regularisation is required p]. One possible approach to regualrisation is to assume that 
/ lies in or close to a convex subset A C H. Under certain conditions, stable inversion 
is then possible. We here extend this approach to more general non-convex sets A C H . 
We define and study an 'optimal' solution to the inverse problem and propose a projected 
Landweber algorithm to efficiently calculate a solution with similar error bounds than 
those derived for the 'optimal' solution. 

Our treatment is motivated by recent work in signal processing, where sparse signal 
models have gained increasing interest. It has been recognized that (under certain 
conditions) many finite dimensional inverse problems can be solved efficiently whenever 
the solution is sparse enough in some basis, that is, whenever the solution admits a 
representation in which most of its elements are zero or negligibly small [2], [3], [I] and 
0- 

In sparse inverse problems, where / G M. N and g G 1R M , we assumed / to have only 
K « N non-zero elements. In the noiseless setting, that is if e = 0, it is then necessary 
to identify that point / in the affine subspace defined by Tf, that has the fewest non- 
zero elements. The sparse inverse problem is a constrained inverse problem in which 
the constraint set is non-convex. More precisely, the constraint set is the union of (^) 
i^-dimensionl subspaces, where each subspaces is spanned by a different combination of 
K canonical basis vectors. More general union of subspaces can be considered [6] and 

m 

Solving sparse inverse problems is known to be NP-hard in general [8]. However, 
under certain conditions it could be shown that polynomial time algorithms can find 
near optimal solutions [2], [3], pE] and [5]. We here extend these ideas to linear inverse 
problems in more general Hilbert spaces and study general non-convex constraint sets. 

Given (ITJ) and assuming that T is ill-conditioned or non- invert ible, we assume that 
/ lies in a set A, where A C H . Importantly, we allow the set A to be non-convex in 
general. 

In this general set-up, three important questions arise. 

(i) Assuming / G A and e = 0, under which conditions can we recover / from Tf? 

(ii) Under which conditions is this inversion stable to small perturbations e, that is, 
under which conditions can we guarantee that the inverse of Tf + e is 'close' to the 
inverse of Tf? 
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(iii) How and under which conditions can we calculate the inversion efficiently? 

The answer to the first question is that we require T to be one to one as a map 
from A to L. This is clearly necessary and sufficient for the recovery of / G A from Tf. 

If .A is a compact subset of Euclidean space, then we can lower bound the dimension 
of L using Whitney's Embedding Theorem [9J chapter 10] and its probabilistic version 
by Sauer et al. [10] that states that 

Theorem 1 (Theorem 2.3 [10]). Assume a compact subset A of R with box counting 
dimension k and let M > 2k, then almost all smooth maps F from R N to M M have the 
property that F is one to one on A. 

Importantly, the above result also holds for almost all linear embeddings. In this 
case, the result can easily be generalized to non-compact subsets of ~R N by considering 
the box counting dimension of the projection of the set A \ onto the unit sphere. 

However, the one to one property does not tell use anything about stability of the 
inverse, neither about how we would go about calculating such an inverse. 

For / G A, in order to guarantee stability to perturbations e, it is necessary to 
impose a bi-Lipschitz condition on T as a map from A C H to TA C L, that is, we 
require that there exist constants < a < (3, such that for all fi, f% G A we have 

a||A + / 2 f< ||T(/ x + / 2 )|| 2 </9||/i + / a || 2 . (2) 

The existence of such maps has been studied in |llj . 

Note that this condition guarantees that T is one to one as a function from A to 
TA. We are therefore able, at least in theory, to invert T. The condition also guarantees 
stability in that if we are given an observations g = Tf + e = T fj± + T(f — fX) + e, 
where fjs, G A, then we could, at least in theory, recover a good approximation of / 
by first choosing g to be the 'projection' (see below how this can be defined for general 
subsets of Hilbert space) of g onto a close element in TA and then taking / G A so that 
g = Tf. We show below that the bi-Lipschitz property of T guarantees that / is close 
to /, where the distance depends on T(f — f\) + e. Note however that the bi-Lipschitz 
constant does not bound T(f — fX) in general, so that we also require T to be bounded, 
i.e. continuous. 

Unfortunately, how the above recovery of / is to be done efficiently for general sets 
A is not clear. We here propose an algorithm and show that under certain conditions on 
a and (3 we can efficiently calculate near optimal solutions. In order for our algorithm 
to be useful, we require that we are able to efficiently calculate the 'projection' of any 
/ onto A. 

In inverse problems, the operator T is generally given. However, we have to choose 
A. On the one hand, we want to choose A in accordance with prior knowledge about 
reasonable solutions for a given problem. On the other hand, the stability considerations 
discussed above as well as the requirements in our theory developed below, show that 
we want the set A be chosen so that T is bi-Lipschitz as a map from A to L. 
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2. An 'optimal' solution 

Let us here formalize the solution to the general inversion problem (OQ), under the 
constraint that / G A as outlined above. We first define what we mean by a projection 
onto a general-non-convex set. For any g G L, consider 

)nf \\g -Tff. (3) 

If A is non-empty, then by the properties of the infirmum and the fact that the norm is 
bounded from below, the above infirmum exists and is finite. However, there might not 
exist an / G A such that \\g — Tf\\ 2 = inf/ e _4 \\g — Tf\\ 2 . In this case, we will consider 
estimates / that are close to this infirmum. 

The definition of the infirmum and the fact that infy g _4 \\g — Tf\\ 2 < oo prove the 
following trivial fact. 

Lemma 2. Let A be a nonempty closed subset of a Hilbert space H . Let T be a linear 
operator form H into a Hilbert space L, then for all e > and g G L, there exist an 
element f G A for which 

\\g-Tf\\<M\\g-Tf\\ + e. (4) 

We can therefore define the set-valued mapping 

™ A {g) = {f:\\g- Tf\\ 2 < inf \\g - Tf\\ 2 + e}. (5) 

By the above lemma, the sets rn e ^(f) are non-empty for all e > 0, so that we can define 
e-optimal estimates as those points 

Fopt G ™\{f). (6) 
Similarly, we can define the set-valued mapping 

£(/) = {/: 11/ -/T< Inf 11/ -/|| 2 + 6}. (7) 

feA 

and the e-projection 

Pl(f) = S( P e A (f)), (8) 

where S is a selection operator that returns a single element from a set. The form of 
this operator is of no consequence for the rest of the discussion as long as it returns a 
single element. 

We can now define e = g — Tf^ pt , = P^(f) and g — Tf b A = e. We have as a 
direct consequence of the definition of f* pt that 

\\g - Tfl pt \\ = \\e\\ < \\g -Tf 5 J + ^= ||e|| + y/i. (9) 

To bound the error of estimates f* t we use the following lemma 

Lemma 3. If T is a bi-Lipschitz map from a nonempty set A to L with bi-Lipschitz 
constants a, (3 > 0, then 

\\f S A -r opt \\<^=[2\\e\\ + ^\ (10) 
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\\f A -fo P t\\<^\\Tf A -Tr opt \ 



<^\\\g-Tr opt \\ + \\g-Tf A 



Proof. 



1 

a 

(11) 

and the Theorem follows form 0. □ 
This result implies 

Theorem 4. IfT is a bounded linear operator from H to L that satisfies the bi-Lipschitz 
condition as a map from A C H to L with constants a, (3 > 0, then, for any 5, e > the 
estimate f^ pt satisfies the bound 



11/ " fo P t\\ < - f S A ) + e\\ + inf ||/ - /|| + + V~S. 



(12) 



Proof. Using [3] and the triangle inequality we have 

ll/-/^ll<i[2||e||+Vel + ||/-/il 



< 4= He|| + inf \\f-f\\ + ^= + VS (13) 



from which the theorem follows by the definition of e. □ 



3. The Iterative Projection Algorithm 

In the previous section we have studied the estimate 

fit = S(m\(g)), (14) 

and bounded the error ||/ — / op t|| if T is bi-Lipschitz from A to L. However, it is not clear 
how to calculate f opt for general sets A. We therefore propose an iterative algorithm in 
which the above optimization is replaced by a series of e-projections. Importantly, we 
show that, under certain conditions on the bi-Lipschitz constants a and /3, this algorithm 
has a similar error bound to that of f opt given above. 

The Iterative Projection Algorithm is a generalization of the Projected Landweber 
Iteration [T5] to non-convex sets and an extension of the Iterative Hard Thresholding 
algorithm of [13], [H] and [5] to more general constrained inverse problems. 

Given g and T, let /° = and e > e" > 0. The Iterative Projection Algorithm is 
the iterative procedure defined by the recursion 

P +1 = PZ(P + fiT*(g-TP)), (15) 

where T* is the adjoint of T. 

In many problems, calculation of P\{a) is much easier than a brute force search for 
flp t . For example, in the A'-sparse model, P%(a) simply keeps the largest (in magnitude) 
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K elements of a sequence a and sets the other elements to zero. The next result shows 
that under certain conditions, not only does the algorithm calculate solutions with 
an error guarantee similar to that of f op t, it does so in a fixed number of iterations 
(depending only on a form of signal to noise ratio). 
We have the following main result. 

Theorem 5. Given g = Tf + e where f 6 H . Let A C H be non-empty such that T is 
bi-Lipschitz as a map from A to L with constants a and (3 that satisfy (3 < - < 1.5a, 
then, after 



n 



ln(<J- 



2- llfAl1 



(16) 



ln(2/(/xa) -2) 

iterations, the Iterative Projection Algorithm calculates a solution f n * satisfying 

11/ - /"I < (c a5 + 8){\\e\\ + .f±) + \\f A - f\\. (17) 



2/i' 

where c < and e = T(f - f A ) + e. 

Note that this is of the same order as the bound for f^ pt . 

The above theorem has been proved for the i^-sparse model in [5] and for constraint 
sparse models in |15j . Our main contribution is to show that it holds for general 
constrained inverse problems, as long as the bi-Lipschitz property holds with appropriate 
constants. To derive the result, we pursue a slightly different approach to that in [5] 
and [15] and instead follow ideas of [IE] . The approach used here is basically that used 
for union of subspace models in [17] . 

We first establish the following lemma. 

Lemma 6. Let r = 2T*(g - T f n ) and f n+1 = P% (f n + nT*(g - Tf n )). If ± > (5 then 

||^_ T jn+l||2 _ ||^_ T jn||2 



1 f n 

< - Re((f A - n,r) + -\\f A - rw 2 + - (IS 



- Re((f-r),r) + -\\(f-r 

1 ril t in A t „l|2 /,, /n\2||„||2l 



Proof. We have 



l 1 1 / n -^|| 2 -W2f||r|| 2 ], (19) 

so that on the one hand 

|b-T/ n+1 || 2 - ||^-T/ n || 2 
= -Re((r+ l -P),r) + \\T(P+ 1 -n\\ 2 

< _ jRe((r +l_ r);r) + I|| (r+ l_ r) ||2 

A 4 
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in f-ite<(/-/») ) r> + i||(/-r)l| a 
feA n 

= ^)^ ll/ " r -f r||2 - ( ' u/2)2||r||2] 
>^[||r +1 -r-frf-W2) 2 ||rf-6] 

= - Re((r +i - n, r ) + -uf n+1 - nir - 

[I fl 

where the inequality comes from the definition of P A 

We have thus shown that f n+1 = P A (f n + fr) implies 

-Re{(r+ i -r),r) + -\\(f-f n )\\ 2 

/i 

<-Re((f-r),r) + -\\(f-r)f + - 

fl [I 

for all / G A. Because f A G A, this implies that 

-Re((f A -r),r) + -\\(f A -r)\\ 2 + - 

> - Re ((r+i - n, r ) + - mi 2 . 



Proof of Theorem^ Using = P A (f) 

Wf-f n+1 \\<\\fA-f n+1 \\ + \\fA-fl 

where the bi-Lipschitz property implies that 

n^-/ n+1 ir<-iir(/^-r +i )ii 2 . 

a 

Furthermore 

im/^-/ n+1 )ii 2 = ib-r/ n+i -ei 2 

= \\9- Tf n+1 \\ 2 + ||e|| 2 - 2Re(e, (g - Tf n+1 )) 

< ii^-T/ n+i ii 2 + i|gf + i|gf + ii^-Tr +i n 2 

= 2||^-rr +1 || 2 + 2||e1 2 ) 
where the last inequality follows from 
-2Re(e, (g-Tf n+1 )) 
= - ||e + (g - Tr +1 )|| 2 + ||e~|| 2 + \\(g- T/- +1 )|| 2 

< ||el 2 +||(^-Tr +1 )|| 2 . 

We will now show that under the Lipschitz assumption of the theorem, 

ii^-rr +i ii 2 <(i -«)ii(^-r)n 2 +iie|| 2 . 
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\\g-Tf n+1 \\ 2 - \\g -Tf n \\ 2 

< -Re((fA-f n ),r) + -\\fA-f n \\ 2 + - 

fi fi 

= - Re((f A ~ f n ),r) + a\\f A - f n f + (- - a)\\f A - f n \\ 2 + - 

fi fi 

< - Re{(f A - r),r) + \\T(f A - Df + (- - a)\\f A - f n \\ 2 + ' 



fi 

\9 - Tf A \\ 2 -\\g- TP\\ 2 + (-- a)\\f A - f n \\ 2 + ' 



fi 

|e|| 2 -\\g- rr|| 2 + (-- a)\\(f A - /")|| 2 + - 

fi fi 



where the first inequality is due to Lemma 

Combining the above inequalities shows that 

1 



WfA-f 



n+1 1 1 2 



< 2 I — - 1 ) ||(^-r)f + i||g|| 2 + — , 
fia J a fia 



so that 2(-^j — 1)<1 implies that 



ll/WT< (2(--l)) II^H 2 + c||e1 2 + c^-, 



fia 



where c < - 4 ol . 

6a— 2 — 



The theorem then follows by the bound 

11/ -/'n < 



2^-2) ||/^ + c||eT + c^ + ||/.-/|| 



which means that after n* 



-) 



ln(2/( M a)-2) 



iterations we have 



11/ -n <(c' 



0.5 




2/x' 



ll/Wll- 



(28) 



(29) 



(30) 



/II, (31) 



(32) 
□ 



4. Convergence 



Whilst we cannot yet show that the algorithm converges, we can show that it will 
'converge' to a neighborhood of ffL t . 
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Theorem 7. Under the assumptions of Theorem^ and using e n -projections in iteration 
n, where e n — > 0, then as n — ► oo 

ll4 t -rir-c||T(/-/^) + e|| 2 (33) 
Proof. Mirroring the derivation that let to (|29l) but replacing by t shows that 

\\fo Pt -f n+1 \\ 2 <2[ — -i) \\(C-n\\ 2 +hnf-C)+e\\ 2 +^.m 



If 2M- - 1) < 1, for each N and k > N, 

\\fo P t-f k \\ 2 

\ \ fc-A 7 



HH^JJ ii/t.-/"r^iw-/y^n 2 +v (35) 

where again c < 3a ^ 2 1_ . Because \\fo pt — f N \\ 2 is bounded from above by (j3"5j) . so that 
in the limit — > oo 



ll/it - /II 2 - c||T(/ - /Jj + ef + (36) 



p<5 fJfc||2 . n \\rri f fS \ i „||2 , ^ e iV 

'2// 

and this holds for all > 0, so that the result follows by letting iV — > oo. □ 



References 

[1] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer Academic 
Publishers, 2000. 

[2] D. Donoho, "Compressed sensing," IEEE Trans, on Information Theory, vol. 52, no. 4, pp. 
1289-1306, 2006. 

[3] E. Candes, "The restricted isometry property and its implications for compressed sensing," Tech. 
Rep., California Institute of Technology, 2008. 

[4] D. Needell and J. Tropp, "COSAMP: Iterative signal recovery from incomplete and inaccurate 
samples.," to appear in Applied Computational Harmonic Analysis, 2008. 

[5] T. Blumensath and M. Davies, "Iterative hard thresholding for compressed sensing," to appear 
in Applied and Computational Harmonic Analysis, 2009. 

[6] Y. Lu and M. Do, "A theory for sampling signals from a union of subspaces," IEEE Transactions 
on Signal Processing, vol. 56, no. 6, pp. 2334-2345, 2008. 

[7] T. Blumensath and M. E. Davies, "Sampling theorems for signals form the union of finite- 
dimensional subspaces," IEEE Transactions on Information Theory, vol. 55, no. 4, pp. 1872- 
1882, 2009. 

[8] B. K. Natarajan, "Sparse approximate solutions to linear systems," SI AM Journal on Computing, 

vol. 24, no. 2, pp. 227-234, Apr 1995. 
[9] J. M. Lee, Introduction to smooth manifolds, Springer, 2000. 
[10] T. Sauer, J. A. Yorke, and M. Casdagli, "Embedology," Journal of Statistical Physics, vol. 65, 

no. 3/4, pp. 579-616, 1991. 
[11] H. Movahedi-Lankarani and R. Wells, "On bi-Lipschitz embeddings," Portugaliae Mathematica, 

vol. 62, pp. 247-268, 2005. 
[12] B. Eicke, "Iteration methods for convexly constrained inverse problems in hilbert space," Numer. 
Fund. Anal, and Optimization and Engineering, vol. 13, no. 5&6, pp. 413-429, 1992. 



Non-convexly constrained linear inverse problems 



10 



[13] N. G. Kingsbury and T. H. Reeves, "Iterative image coding with overcomplete complex wavelet 
transforms," in Proc. Con], on Visual Communications and Image Processing, 2003. 

[14] T. Blumensath and M.E. Davies, "Iterative thresholding for sparse approximations," Journal of 
Fourier Analysis and Applications, vol. 14, no. 5, pp. 629-654, 2008. 

[15] R. Baraniuk, V. Cevher, M. Duarte, and C. Hegde, "Model-based compressive sensing," preprint, 
2008. 

[16] R. Garg and R. Khandekar, "Gradient descend with sparsification: An iterative algorithm 
for sparse recovery with restricted isometry property," in Proceedings of the 26th Annual 
International Conference on Machine Learning, 2009, pp. 337-344. 

[17] T. Blumensath "Sampling and reconstructing signals from a union of linear subspaces," 
\arXiv:0911.35Lfo l, 2009. 



